Merged
Conversation
✅ Deploy Preview for splinterdb canceled.
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This patch fixes two deadlocks, also enabling threads to perform updates to splinterdb while also performing asynchronous queries.
The first deadlock was caused by trunk_merge_lookup acquiring two locks on the root. This deadlock prone because, if another thread begins to GC the root, it will set the writer bit on the root lock and then wait for the readers to clear out. But, if the query code has already acquired its first read lock on the root and attempts to acquire its second read lock after the writer bit is set, then it will deadlock with the GC thread.
The fix is to acquire a lock on the root only once.
The second deadlock could occur when a thread performing asynchronous queries performs an update which results in a GC. In this case, one of the thread's async queries may hold a read lock on a node-to-be-GCed, but the GC code will wait forever for that read lock to be released, causing the thread to deadlock with itself.
The fix in this case is to defer GC until there are no more readers. So there's now a queue of GC tasks in the trunk context. Any time GC reaches a node with a read lock, it stops and puts that node on the queue. Every trunk modification checks the queue and processes any GCs that are now available to perform.
Fixes #314 .